definition 5
From data to concepts via wiring diagrams
Lo, Jason, Jafari, Mohammadnima
A wiring diagram is a labeled directed graph that represents an abstract concept such as a temporal process. In this article, we introduce the notion of a quasi-skeleton wiring diagram graph, and prove that quasi-skeleton wiring diagram graphs correspond to Hasse diagrams. Using this result, we designed algorithms that extract wiring diagrams from sequential data. We used our algorithms in analyzing the behavior of an autonomous agent playing a computer game, and the algorithms correctly identified the winning strategies. We compared the performance of our main algorithm with two other algorithms based on standard clustering techniques (DBSCAN and agglomerative hierarchical), including when some of the data was perturbed. Overall, this article brings together techniques in category theory, graph theory, clustering, reinforcement learning, and data engineering.
- Health & Medicine (0.68)
- Leisure & Entertainment > Games > Computer Games (0.34)
Diffusion Models: A Mathematical Introduction
Maleki, Sepehr, Pourmoazemi, Negar
We present a concise, self-contained derivation of diffusion-based generative models. Starting from basic properties of Gaussian distributions (densities, quadratic expectations, re-parameterisation, products, and KL divergences), we construct denoising diffusion probabilistic models from first principles. This includes the forward noising process, its closed-form marginals, the exact discrete reverse posterior, and the related variational bound. This bound simplifies to the standard noise-prediction goal used in practice. We then discuss likelihood estimation and accelerated sampling, covering DDIM, adversarially learned reverse dynamics (DDGAN), and multi-scale variants such as nested and latent diffusion, with Stable Diffusion as a canonical example. A continuous-time formulation follows, in which we derive the probability-flow ODE from the diffusion SDE via the continuity and Fokker-Planck equations, introduce flow matching, and show how rectified flows recover DDIM up to a time re-parameterisation. Finally, we treat guided diffusion, interpreting classifier guidance as a posterior score correction and classifier-free guidance as a principled interpolation between conditional and unconditional scores. Throughout, the focus is on transparent algebra, explicit intermediate steps, and consistent notation, so that readers can both follow the theory and implement the corresponding algorithms in practice.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- (2 more...)
A our Work
Reddi et al. (2018) observed that Adam does not collect long-term memory of past gradients and therefore the effective learning rate could be increasing in some cases. Zaheer et al. (2018) proposed a more controlled increase in the effective learning rate by Our work is primarily theoretical in nature, but we discuss its broader impacts here. We follow the same proof strategy as Li et al. (2019). To apply Theorem B.2, the major condition to verify is that To prove Theorem B.2, we need the following lemma from Li et al. (2019). Under Condition (3) of Theorem B.2, given the initial point We also need the following lemma adapted from Lemma C.2, Li et al. (2021).
Optimistic Higher-Order Superposition
Bentkamp, Alexander, Blanchette, Jasmin, Hetzenberger, Matthias, Waldmann, Uwe
The $λ$-superposition calculus is a successful approach to proving higher-order formulas. However, some parts of the calculus are extremely explosive, notably due to the higher-order unifier enumeration and the functional extensionality axiom. In the present work, we introduce an "optimistic" version of $λ$-superposition that addresses these two issues. Specifically, our new calculus delays explosive unification problems using constraints stored along with the clauses, and it applies functional extensionality in a more targeted way. The calculus is sound and refutationally complete with respect to a Henkin semantics. We have yet to implement it in a prover, but examples suggest that it will outperform, or at least usefully complement, the original $λ$-superposition calculus.
- Europe > Austria > Vienna (0.13)
- Europe > Germany (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Truth-Aware Decoding: A Program-Logic Approach to Factual Language Generation
This paper introduces Truth-Aware Decoding (TAD), a verification-oriented decoding scheme that aligns neural language generation with knowledge bases. Situated in the tradition of probabilistic program semantics for sequence models, TAD augments modern instruction-tuned systems with a lattice of semantic guards that operate at decode time. Our contributions are fourfold: (i) a constraint-based semantics that renders oracle filtering as a program-logic judgment, (ii) a proof that greedy selection enjoys local likelihood dominance under sound and complete guards (Theorem 2.7), (iii) an entropy-style invariant that quantifies factual risk via knowledge-aware safe mass, and (iv) a multi-agent operational calculus with verified Lean artefacts to certify implementation behaviour. Numerical and algorithmic case studies confirm that the resulting guardrails reduce hallucinations without sacrificing throughput, yielding a pragmatic bridge between large-scale empirical models and formal verification.
Incoherence in goal-conditioned autoregressive models
Karwowski, Jacek, Douglas, Raymond
We investigate mathematically the notion of incoherence: a structural issue with reinforcement learning policies derived by naive goal-conditioning of autoregressive models. We focus on the process of re-training models on their own actions, that is, fine-tuning offline-learned policies with online RL. We prove that it decreases incoherence and leads to an improvement in return, and we aim to characterize the resulting trajectory of policies. By re-framing standard notions of control-as-inference and soft Q learning, we establish a three-way correspondence with two other ways of understanding the iterative re-training process: as folding the posterior into the reward and, in the deterministic case, as decreasing the temperature parameter; the correspondence has computational content via the training-inference trade-off. Through soft-conditioning generative models, we discuss the link between incoherence and the effective horizon.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
Gaussian Mixture Model with unknown diagonal covariances via continuous sparse regularization
Giard, Romane, de Castro, Yohann, Marteau, Clément
This paper addresses the statistical estimation of Gaussian Mixture Models (GMMs) with unknown diagonal covariances from independent and identically distributed samples. We employ the Beurling-LASSO (BLASSO), a convex optimization framework that promotes sparsity in the space of measures, to simultaneously estimate the number of components and their parameters. Our main contribution extends the BLASSO methodology to multivariate GMMs with component-specific unknown diagonal covariance matrices-a significantly more flexible setting than previous approaches requiring known and identical covariances. We establish non-asymptotic recovery guarantees with nearly parametric convergence rates for component means, diagonal covariances, and weights, as well as for density prediction. A key theoretical contribution is the identification of an explicit separation condition on mixture components that enables the construction of non-degenerate dual certificates-essential tools for establishing statistical guarantees for the BLASSO. Our analysis leverages the Fisher-Rao geometry of the statistical model and introduces a novel semi-distance adapted to our framework, providing new insights into the interplay between component separation, parameter space geometry, and achievable statistical recovery.
- Europe > France (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)